Fix excessive logging in create_pr.py that creates 43MB+ log files #2000

Open
mohammedahmed18 wants to merge 1 commit into main from fix/excessive-logging-create-pr

@mohammedahmed18
Contributor

Problem

Line 38 of create_pr.py logged every key in the function_to_tests dictionary via list(function_to_tests.keys()). For large codebases like budibase (1012 functions), a single DEBUG statement printed over a thousand function names per call, contributing to massive log files (43MB+).

Evidence

  • Trace ID: 3d2ad2f0-254a-4401-9c93-84f691acabf0 (43MB log, 534K lines)
  • Location: Line 533922 of the log shows a list of 1000+ function keys in a single log entry
  • Impact: Affects 4/22 logs (18%) in a recent optimization run
  • Size: Each occurrence adds ~100KB to the log file

Root Cause

The debug logging statement at line 38 was designed for small projects, but became a problem in monorepos containing hundreds of packages:

# Before (buggy):
logger.debug(f"[PR-DEBUG] function_to_tests keys: {list(function_to_tests.keys())}")

Fix

Changed to log only the count:

# After (fixed):
logger.debug(f"[PR-DEBUG] function_to_tests has {len(function_to_tests)} keys")

This reduces log output from ~100KB to ~50 bytes per call.
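
The size difference can be reproduced with a minimal sketch. The key names below are synthetic placeholders (an assumption, not the real budibase keys), so the absolute byte counts will differ from the ~100KB reported above; only the two format strings are taken from this PR.

```python
# Hypothetical stand-in for function_to_tests: 1012 synthetic keys.
# Real keys are qualified function names and may be longer or shorter.
function_to_tests = {f"pkg/module_{i}:function_{i}": set() for i in range(1012)}

# The two messages, formatted exactly as in the before/after lines of this PR.
before = f"[PR-DEBUG] function_to_tests keys: {list(function_to_tests.keys())}"
after = f"[PR-DEBUG] function_to_tests has {len(function_to_tests)} keys"

before_bytes = len(before.encode("utf-8"))
after_bytes = len(after.encode("utf-8"))

print(f"before: {before_bytes} bytes, after: {after_bytes} bytes")
```

The "before" message grows linearly with the number of keys, while the "after" message grows only with the number of digits in the count.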

Testing

  • ✅ Added 2 regression tests in test_create_pr_logging_bug.py
  • ✅ Tests verify count is logged, not full key list
  • ✅ Tests verify log output stays under 10KB (vs 100KB+ before)
  • ✅ All existing tests pass
  • ✅ Linting passes (uv run prek)

Impact

  • Severity: MEDIUM (doesn't break functionality, but bloats logs)
  • Type: Systematic (reproducible on every optimization run with large codebases)
  • Benefits: Significantly reduces log file sizes for monorepo projects

Related Issues

Part of a comprehensive bug-fixing effort for Codeflash optimization stability.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@claude
Contributor

claude bot commented Apr 5, 2026

Claude finished @mohammedahmed18's task in 2m 18s


PR Review Summary

  • Triage PR size — SMALL (1 line of production code changed)
  • Run lint/typecheck — prek passed; mypy has 4 errors in the new test file
  • Resolve stale threads — none found
  • Review code
  • Duplicate detection
  • Post summary

Prek Checks

uv run prek run --from-ref origin/main passed (no issues).

mypy reports 4 errors in the new test file (test_create_pr_logging_bug.py) — missing return type annotations and a missing type annotation for a dict variable. These are in test code and won't affect runtime behavior.


Code Review

The fix itself is correct. Changing list(function_to_tests.keys()) to len(function_to_tests) resolves the 43MB log issue.

However, there are two broader concerns:

1. The [PR-DEBUG] statements are temporary debug code — they should all be removed

codeflash/result/create_pr.py contains 18+ [PR-DEBUG]-prefixed logger.debug() calls. The [PR-DEBUG] prefix is a hallmark of ad-hoc debugging code that was never cleaned up. Fixing only line 38 leaves the rest in production:

  • Line 47 (loop): logger.debug(f"[PR-DEBUG] test_file: {tf.tests_in_file.test_file}, test_type={tf.tests_in_file.test_type}") — iterates over all test files for every function; still O(n) output
  • Lines 69–77 (loop): Two debug lines per registry entry, inside existing_tests_source_for, which can be called once per function
  • Lines 134–141: Per-invocation-id debug lines inside the main processing loop

The right fix is to remove all [PR-DEBUG] statements, not patch the worst one. They're noise in the debug log and contribute to bloat at scale.
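
As a rough illustration of why in-loop debug statements still scale with codebase size, here is a minimal sketch. The loop shape, logger name, and message contents are assumptions for illustration, not the actual code in create_pr.py; the handler only counts records rather than writing them.

```python
import logging


class CountingHandler(logging.Handler):
    """Counts records and message bytes instead of writing them anywhere."""

    def __init__(self) -> None:
        super().__init__()
        self.records_seen = 0
        self.bytes_seen = 0

    def emit(self, record: logging.LogRecord) -> None:
        self.records_seen += 1
        self.bytes_seen += len(record.getMessage().encode("utf-8"))


def measure_loop_logging(num_functions: int, tests_per_function: int) -> CountingHandler:
    # Hypothetical logger name; a fresh logger per call keeps runs independent.
    logger = logging.getLogger(f"pr_debug_sketch.{num_functions}.{tests_per_function}")
    logger.setLevel(logging.DEBUG)
    logger.propagate = False
    handler = CountingHandler()
    logger.addHandler(handler)
    # One debug line per test file, repeated for every function (assumed shape).
    for _ in range(num_functions):
        for t in range(tests_per_function):
            logger.debug(f"[PR-DEBUG] test_file: tests/test_{t}.py, test_type=existing")
    logger.removeHandler(handler)
    return handler


small = measure_loop_logging(num_functions=10, tests_per_function=5)
large = measure_loop_logging(num_functions=1012, tests_per_function=5)
print(small.records_seen, large.records_seen)
```

Under these assumptions the record count is the product of the two loop sizes, so a per-item debug line inside the per-function path keeps growing with the project even after the line 38 fix.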

2. Test file is overbuilt for a 1-line change

test_create_pr_logging_bug.py is 147 lines for a fix that just changes one log format string. The two tests are near-identical (large dict vs small dict) and both verify the same thing. The size assertion (< 10000 bytes) is also brittle — it ties test correctness to output byte counts that will vary with Python version and log handler configuration.

A simpler test would assert the log message format directly using pytest's caplog fixture.
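
A minimal sketch of what such a test might look like follows. The helper log_function_to_tests_summary and the logger name are hypothetical stand-ins for the logging call in create_pr.py; a real test would exercise the production function instead. Only the message format is taken from this PR.

```python
import logging

# Hypothetical logger name; the real module's logger may differ.
logger = logging.getLogger("create_pr_sketch")


def log_function_to_tests_summary(function_to_tests: dict) -> None:
    # Stand-in for the fixed line 38: logs the count, not the keys.
    logger.debug(f"[PR-DEBUG] function_to_tests has {len(function_to_tests)} keys")


def test_logs_count_not_keys(caplog) -> None:
    # Synthetic keys; 1012 mirrors the function count quoted for budibase.
    function_to_tests = {f"pkg/module_{i}:function_{i}": set() for i in range(1012)}

    with caplog.at_level(logging.DEBUG, logger=logger.name):
        log_function_to_tests_summary(function_to_tests)

    messages = [record.getMessage() for record in caplog.records]
    # Asserting the exact message covers both "count is logged" and "keys are not".
    assert messages == ["[PR-DEBUG] function_to_tests has 1012 keys"]
```

Asserting on the exact message avoids the byte-count threshold entirely, so the test does not depend on handler configuration.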


Duplicate Detection

No duplicates detected. The changed function existing_tests_source_for has no equivalent in other modules.


Test Coverage

Tests pass. Coverage for the changed production line is confirmed by the added tests.

